Introduction

To investigate ride behavior differences between casual and member users and uncover temporal and spatial patterns in ride activity, a comprehensive and well-structured database is essential. The analysis focuses on understanding how ride patterns vary across time—daily, weekly, and seasonally—and space—stations and routes—while identifying trends in ride duration, station popularity, and overall demand. These insights are critical for guiding Divvy’s operational decisions and marketing strategies.

The source data for this project consists of 12 monthly Divvy trip datasets for the year 2024, containing ride-level information such as ride identifiers, timestamps, start and end stations, and user type (casual vs. member). To efficiently support analysis, a relational database will be designed to:

  • Consolidate the monthly datasets into a single, queryable structure.
  • Maintain data integrity with primary keys and appropriate data types for timestamps, text fields, and identifiers.
  • Enable temporal analysis by storing ride start and end times in a standardized timestamp format.
  • Support spatial analysis by including station names and IDs, allowing examination of station popularity and route patterns.
  • Facilitate user segmentation by distinguishing between casual and member riders.

By implementing this database, analysts will be able to efficiently query and aggregate data, uncover patterns in ride behavior, and generate actionable insights for Divvy’s operational planning and marketing initiatives.


Database Creation

Database connection

This section establishes a connection to the PostgreSQL database used throughout the analysis. The connection details (host, database name, username, password, and port) are stored in a configuration file (db_config.ini) to keep credentials separate from the code. The connection attempt is wrapped in a tryCatch block so that a failure halts execution with a clear, informative message rather than an opaque error. Once the connection is successfully created, it is registered with knitr, which allows all subsequent SQL chunks in the R Markdown document to use it automatically without passing it explicitly each time.

# Packages used in this and later chunks (assumed here; they may
# already be attached in a setup chunk)
library(DBI)        # dbConnect(), dbExecute(), dbWriteTable()
library(RPostgres)  # Postgres() driver
library(ini)        # read.ini()
library(glue)       # glue() for SQL string interpolation
library(readr)      # read_csv()

# Read config
config <- read.ini("resources/db_config.ini")
db <- config$postgresql

# Safe database connection
con <- tryCatch(
  dbConnect(
    Postgres(),
    host = db$host,
    dbname = db$database,
    user = db$user,
    password = db$password,
    port = as.integer(db$port)
  ),
  error = function(e) {
    stop("Database connection failed: ", e$message)
  }
)

# Register connection for SQL chunks
knitr::opts_chunk$set(connection = con)

Database schema

Before loading any data, the code ensures that a dedicated schema named divvy exists in the PostgreSQL database. A schema provides a logical container for all tables, indexes, and views related to this project, keeping Divvy trip data organized and separate from unrelated datasets. The IF NOT EXISTS clause avoids errors if the schema is already present, making the script idempotent and safe to re-run.

CREATE SCHEMA IF NOT EXISTS divvy;

Database tables

Read data into R environment

This part of the workflow brings the raw Divvy trip data for the year 2024 into R. Each month’s data is stored in a CSV file, and the code dynamically generates file paths for all 12 months. Using a loop, the files are read into R one by one and saved into a list object named divvy_data. Each dataset is keyed by its corresponding month (e.g., "january", "february", etc.), which makes it easy to reference and process each month individually later in the pipeline. At this point, the raw data is prepared in memory and ready for structured loading into the database.

# Define year and months
year <- "2024"
months <- sprintf("%02d", 1:12)

# Initialize list to store data
divvy_data <- list()

# Loop through months and read each file
for (m in months) {
  file_path <- paste0("resources/data/", year, m, "-divvy-tripdata.csv")
  month_name <- tolower(format(as.Date(paste0(year, "-", m, "-01")), "%B"))
  
  divvy_data[[month_name]] <- read_csv(file_path, show_col_types = FALSE)
}
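Before writing anything to the database, a quick structural check can catch schema drift between monthly files. The sketch below (assuming the divvy_data list built by the loop above) verifies that all twelve data frames share the same column names and reports per-month row counts for later verification:

```r
# Sanity check: every monthly data frame should have identical columns
# (assumes divvy_data as built above).
ref_cols <- names(divvy_data[[1]])
stopifnot(all(vapply(
  divvy_data,
  function(df) identical(names(df), ref_cols),
  logical(1)
)))

# Row counts per month, handy for checking the database load later
sapply(divvy_data, nrow)
```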

Create tables and load data for every month

Once the monthly data is loaded into R, the next step is to create corresponding tables in the PostgreSQL database. For each month, any existing version of the table is dropped to avoid conflicts. The raw data is first written into a temporary staging table, which serves as a safe space for cleaning and transformation. From this staging table, a final monthly table is created with duplicate rides removed by keeping only the most recent entry for each ride_id. A primary key constraint is then added to enforce ride uniqueness at the database level. Finally, the staging table is dropped to conserve storage and maintain a clean schema. This process ensures that each monthly table is consistent, deduplicated, and properly constrained.

# Months we want to load
months <- names(divvy_data)   # "january", "february", ... "december"

for (month_name in months) {
  
  # Drop old table if exists
  dbExecute(con, glue("DROP TABLE IF EXISTS divvy.{month_name} CASCADE;"))
  
  # Write raw data into a temp table (staging area)
  dbWriteTable(
    conn      = con,
    name      = DBI::Id(schema = "divvy", table = paste0(month_name, "_staging")),
    value     = divvy_data[[month_name]],
    overwrite = TRUE,
    row.names = FALSE
  )
  
  # Create final deduplicated + constrained table in one step
  dbExecute(con, glue("
    CREATE TABLE divvy.{month_name} AS
    SELECT DISTINCT ON (ride_id) *
    FROM divvy.{month_name}_staging
    ORDER BY ride_id, ended_at DESC;
  "))
  
  # Enforce ride uniqueness with a primary key
  dbExecute(con, glue("
    ALTER TABLE divvy.{month_name}
      ADD CONSTRAINT {month_name}_pk PRIMARY KEY (ride_id);
  "))
  
  # Drop the staging table
  dbExecute(con, glue("DROP TABLE divvy.{month_name}_staging;"))
}
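Once the loop finishes, a lightweight check (assuming con, months, and divvy_data from the chunks above) can confirm that each monthly table exists and that deduplication only ever removed rows:

```r
# Verify each monthly table: row count must not exceed the raw file's
for (month_name in months) {
  n_db  <- as.numeric(dbGetQuery(
    con, glue("SELECT COUNT(*) AS n FROM divvy.{month_name};")
  )$n)
  n_raw <- nrow(divvy_data[[month_name]])
  stopifnot(n_db <= n_raw)
}
```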

Data cleaning

Fix data types

To ensure consistent and accurate analysis, this step enforces correct data types on time-related fields. The columns started_at and ended_at are explicitly cast to the TIMESTAMP type. This guarantees that all time-based operations, such as extracting hours or calculating trip durations, will work reliably and without type conversion errors.

for (month_name in months) {
  fix_data_types_sql <- glue("
    ALTER TABLE divvy.{month_name}
      ALTER COLUMN started_at TYPE TIMESTAMP USING started_at::timestamp,
      ALTER COLUMN ended_at TYPE TIMESTAMP USING ended_at::timestamp;
  ")
  dbExecute(con, fix_data_types_sql)
}

Handle NULL values

Missing values in the dataset can cause errors during analysis or create misleading results. To address this, the code replaces NULL values in station name and station ID columns with sensible defaults. Unknown station names are labeled as "Unknown", while missing IDs are set to "NA". This ensures that every row has valid entries for key fields, simplifying aggregations, joins, and groupings later on.

for (month_name in months) {
  handle_null_sql <- glue("
    UPDATE divvy.{month_name}
    SET start_station_name = COALESCE(start_station_name, 'Unknown'),
        end_station_name   = COALESCE(end_station_name, 'Unknown'),
        start_station_id   = COALESCE(start_station_id, 'NA'),
        end_station_id     = COALESCE(end_station_id, 'NA');
  ")
  dbExecute(con, handle_null_sql)
}

Standardize station names

Station names often contain inconsistencies such as extra spaces or variations in capitalization. This step trims whitespace and converts names to initial-capital (proper) case with INITCAP, producing a consistent representation across all records that prevents problems when grouping or filtering by station. One trade-off: INITCAP lower-cases interior capitals, so a name like "DuSable" becomes "Dusable"; this is accepted here in exchange for uniformity.

for (month_name in months) {
  station_names_sql <- glue("
    UPDATE divvy.{month_name}
    SET start_station_name = INITCAP(TRIM(start_station_name)),
        end_station_name   = INITCAP(TRIM(end_station_name));
  ")
  dbExecute(con, station_names_sql)
}

Feature engineering

Add extra features

To enrich the dataset for analysis, new columns are added to capture additional information about each ride. These features include trip duration in minutes (ride_length_min), the day of the week when the ride started (day_of_week), whether the ride occurred on a weekend (is_weekend), and the hour of the day when the ride began (start_hour). These engineered features make it possible to perform richer time-based, behavioral, and segmentation analyses without modifying the raw data.

for (month_name in months) {
  extra_feature_sql <- glue("
    ALTER TABLE divvy.{month_name}
      ADD COLUMN IF NOT EXISTS ride_length_min NUMERIC,
      ADD COLUMN IF NOT EXISTS day_of_week TEXT,
      ADD COLUMN IF NOT EXISTS is_weekend BOOLEAN,
      ADD COLUMN IF NOT EXISTS start_hour INT;
  ")
  dbExecute(con, extra_feature_sql)
}

Update after adding extra features

After adding the new columns, this step populates them with values derived from existing data. The day_of_week is extracted from the start time, and is_weekend is set to true for rides taken on Saturdays and Sundays. The start_hour is calculated from the ride’s start time, allowing for hourly usage analysis. Finally, the ride_length_min column is computed as the difference between the ride’s start and end times, converted into minutes. Together, these transformations create a dataset that is far more informative and ready for exploratory analysis.

for (month_name in months) {
  update_feature_sql <- glue("
    UPDATE divvy.{month_name}
    SET day_of_week = TRIM(TO_CHAR(started_at, 'Day')),
        is_weekend  = EXTRACT(ISODOW FROM started_at) IN (6,7),
        start_hour  = EXTRACT(HOUR FROM started_at),
        ride_length_min = EXTRACT(EPOCH FROM (ended_at - started_at)) / 60.0;
  ")
  dbExecute(con, update_feature_sql)
}

Combine tables

The final step consolidates all 12 monthly tables into one master table called all_trips. If an older version of the table exists, it is dropped to ensure a fresh build. The new table is then created by combining all the monthly tables using UNION ALL, which preserves all rows from each month without removing duplicates. The result is a comprehensive dataset that covers the entire year of 2024 in a single table. Having one unified table simplifies queries, reporting, and advanced analytics, since analysts can work with one central source rather than juggling multiple monthly datasets.

# Drop if already exists, then create fresh table
dbExecute(con, glue("DROP TABLE IF EXISTS divvy.all_trips CASCADE;"))
## [1] 0
all_trips_query <- paste0(
  "CREATE TABLE IF NOT EXISTS divvy.all_trips AS\n",
  paste(
    sprintf("SELECT * FROM divvy.%s", months),
    collapse = "\nUNION ALL\n"
  )
)
dbExecute(con, all_trips_query)
## [1] 5860568
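A small consistency check (again assuming con and months from earlier chunks) can confirm that the union preserved every monthly row, since UNION ALL should yield exactly the sum of the monthly counts:

```r
# all_trips row count should equal the sum of the monthly tables
monthly_n <- vapply(months, function(m) {
  as.numeric(dbGetQuery(con, glue("SELECT COUNT(*) AS n FROM divvy.{m};"))$n)
}, numeric(1))
total_n <- as.numeric(
  dbGetQuery(con, "SELECT COUNT(*) AS n FROM divvy.all_trips;")$n
)
stopifnot(total_n == sum(monthly_n))
```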

Database indexes

To improve query performance on the divvy.all_trips table, several indexes have been created. These indexes are aligned with the analytical views, ensuring that frequent filtering, grouping, and aggregation operations can run efficiently at scale.

Weekly patterns (day of week)

CREATE INDEX idx_all_trips_dow_member
ON divvy.all_trips ((EXTRACT(DOW FROM started_at)), member_casual);

Seasonal categorization

-- Expression index on month for faster season grouping
CREATE INDEX idx_all_trips_month_member
ON divvy.all_trips ((EXTRACT(MONTH FROM started_at)), member_casual);

Roundtrips

-- Index for station comparisons and member type
CREATE INDEX idx_all_trips_start_end_member
ON divvy.all_trips (start_station_id, end_station_id, member_casual);

Peak usage (hour of day)

-- Expression index on hour of day
CREATE INDEX idx_all_trips_hour_member
ON divvy.all_trips ((EXTRACT(HOUR FROM started_at)), member_casual);

Member-to-casual ratios

-- Index on ride_date for daily grouping
CREATE INDEX idx_all_trips_date_member
ON divvy.all_trips ((DATE_TRUNC('day', started_at)), member_casual);

Time-based analysis

The index on started_at (idx_all_trips_started_at) is designed to speed up time-based queries. Several of the analytical views, such as vw_hourly_usage, vw_peak_usage, and vw_member_casual_ratio, rely on filtering, grouping, or truncating ride start times. By indexing this column, the database can quickly locate rows that fall within a given date or time range, which matters for a table containing several million rides.

CREATE INDEX idx_all_trips_started_at ON divvy.all_trips(started_at);

User segmentation

The index on member_casual (idx_all_trips_member_casual) supports queries that segment rides by user type. Views like vw_user_type, vw_weekly_patterns, and vw_member_casual_ratio frequently group or filter by this column. Since it has very low cardinality (only two values, “member” and “casual”), this index is lightweight on its own and most useful when combined with other fields in composite indexes.

CREATE INDEX idx_all_trips_member_casual ON divvy.all_trips(member_casual);

Stations

Indexes on start_station_id and end_station_id (idx_all_trips_start_station and idx_all_trips_end_station) are intended to optimize spatial analysis. These columns are frequently used in views like vw_station_popularity and vw_routes, where queries involve grouping by station or counting trips associated with specific locations. With these indexes in place, station-based lookups and aggregations become faster, especially when analyzing popular stations or routes.

CREATE INDEX idx_all_trips_start_station ON divvy.all_trips(start_station_id);
CREATE INDEX idx_all_trips_end_station ON divvy.all_trips(end_station_id);

Composite indexes (multi-column)

Two multi-column indexes are introduced to support queries that involve both segmentation and time or both station identifiers.

The idx_trips_user_date index combines member_casual and started_at, which benefits queries that group by both user type and time, such as those behind vw_weekly_patterns and vw_member_casual_ratio. The composite index allows efficient retrieval across both dimensions simultaneously.

CREATE INDEX idx_trips_user_date ON divvy.all_trips(member_casual, started_at);

The idx_trips_routes index combines start_station_id and end_station_id, aligning directly with the vw_routes view. Since this view summarizes trips between pairs of stations, the composite index accelerates grouping and aggregation by route.

CREATE INDEX idx_trips_routes ON divvy.all_trips(start_station_id, end_station_id);

Ride duration

The index on ride duration (idx_all_trips_duration) is created on an expression that calculates the difference between ended_at and started_at in minutes. This supports queries that filter or group on that exact expression, such as identifying unusually long or short rides, without recalculating it for every row at query time. Note, however, that an expression index is only used when a query repeats the indexed expression verbatim; the summary statistics later in this document filter on the engineered ride_length_min column instead, so a plain index on ride_length_min would serve those particular queries more directly.

CREATE INDEX idx_all_trips_duration ON 
divvy.all_trips((EXTRACT(EPOCH FROM (ended_at - started_at)) / 60));

Update Postgres statistics

Running ANALYZE refreshes the planner's statistics for divvy.all_trips so that the query optimizer can make informed decisions about when to use the indexes created above.

ANALYZE divvy.all_trips;
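To see whether the planner can actually use these indexes, EXPLAIN on a representative filtered query is a quick probe. The plan produced depends on the data distribution and Postgres version, so this is illustrative rather than a guaranteed outcome:

```sql
-- Probe: can the planner use idx_all_trips_hour_member for an
-- hour-plus-user-type filter? (Output varies with statistics.)
EXPLAIN
SELECT COUNT(*)
FROM divvy.all_trips
WHERE EXTRACT(HOUR FROM started_at) = 8
  AND member_casual = 'member';
```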

Database views

The following views provide a structured way to analyze Divvy bike-share data for 2024. They are designed to answer specific analytical and business questions about usage patterns, user behavior, and demand across time and space. Each view builds on the consolidated all_trips table and serves as a reusable component for reporting and exploratory analysis.

Master view

The vw_all_trips view acts as the primary entry point for analyzing the dataset. It is simply a direct reference to the all_trips table, ensuring consistency when working with derived views or queries. This provides analysts with a reliable way to access the full dataset without repeatedly referencing the base table.

CREATE OR REPLACE VIEW divvy.vw_all_trips AS
SELECT * FROM divvy.all_trips;

User type comparison

This view provides a quick comparison of ride behavior between casual riders and members. It summarizes the number of rides taken by each user type and calculates their average ride duration in minutes. The purpose of this view is to highlight differences in usage patterns, which can inform marketing strategies, membership growth initiatives, and operational planning.

CREATE OR REPLACE VIEW divvy.vw_user_type AS
SELECT
  member_casual,
  COUNT(*) AS total_rides,
  AVG(EXTRACT(EPOCH FROM (ended_at - started_at)) / 60) AS avg_duration_min
FROM divvy.all_trips
GROUP BY member_casual;

Hourly usage

This view analyzes rides by hour of the day and distinguishes between casual and member users. It provides counts of rides for each hour, enabling insights into peak demand periods such as commuting hours or recreational usage times. The results from this view are helpful for understanding daily demand cycles and planning resource allocation like bike availability or re-balancing.

CREATE OR REPLACE VIEW divvy.vw_hourly_usage AS
SELECT 
  EXTRACT(HOUR FROM started_at) AS hour_of_day,
  member_casual,
  COUNT(*) AS rides
FROM divvy.all_trips
GROUP BY EXTRACT(HOUR FROM started_at), member_casual;

Station popularity

The vw_station_popularity view ranks stations based on the number of departures. It identifies the most frequently used starting locations by counting departures per station and assigning a popularity rank. This analysis helps uncover hotspots in the network and provides guidance for station placement, expansion, or infrastructure improvements.

CREATE OR REPLACE VIEW divvy.vw_station_popularity AS
SELECT 
  start_station_id,
  start_station_name,
  COUNT(*) AS departures,
  RANK() OVER (ORDER BY COUNT(*) DESC) AS popularity_rank
FROM divvy.all_trips
GROUP BY start_station_id, start_station_name;
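As a usage example, the view can be queried like any table; for instance, pulling the five busiest start stations:

```sql
-- Top 5 start stations by departures
SELECT start_station_name, departures, popularity_rank
FROM divvy.vw_station_popularity
ORDER BY popularity_rank
LIMIT 5;
```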

Routes analysis

This view summarizes trips between pairs of stations, providing the total number of rides for each route. It enables the identification of the most popular travel flows and routes across the network. Such insights are valuable for planning route optimization, understanding commuting patterns, and supporting operational decisions like station balancing or targeted promotions.

CREATE OR REPLACE VIEW divvy.vw_routes AS
SELECT 
  start_station_id,
  start_station_name,
  end_station_id,
  end_station_name,
  COUNT(*) AS trip_count
FROM divvy.all_trips
GROUP BY start_station_id, start_station_name, end_station_id, end_station_name;

Weekly ride patterns (day of week)

This view aggregates rides by user type and day of week (0 = Sunday through 6 = Saturday), reporting total rides and average duration in minutes. It makes weekday-versus-weekend differences between members and casual riders easy to compare.

CREATE OR REPLACE VIEW divvy.vw_weekly_patterns AS
SELECT
    member_casual,
    EXTRACT(DOW FROM started_at) AS day_of_week, -- 0=Sunday … 6=Saturday
    COUNT(*) AS total_rides,
    ROUND(AVG(EXTRACT(EPOCH FROM (ended_at - started_at)) / 60), 2) AS avg_duration_min
FROM divvy.all_trips
GROUP BY member_casual, EXTRACT(DOW FROM started_at)
ORDER BY member_casual, day_of_week;

Seasonal categorization

This view buckets rides into Winter, Spring, Summer, and Fall based on the starting month, then reports total rides and average duration per user type and season, quantifying the seasonality of demand.

CREATE OR REPLACE VIEW divvy.vw_seasonal_trends AS
SELECT
    member_casual,
    CASE
        WHEN EXTRACT(MONTH FROM started_at) IN (12, 1, 2) THEN 'Winter'
        WHEN EXTRACT(MONTH FROM started_at) IN (3, 4, 5) THEN 'Spring'
        WHEN EXTRACT(MONTH FROM started_at) IN (6, 7, 8) THEN 'Summer'
        WHEN EXTRACT(MONTH FROM started_at) IN (9, 10, 11) THEN 'Fall'
    END AS season,
    COUNT(*) AS total_rides,
    ROUND(AVG(EXTRACT(EPOCH FROM (ended_at - started_at)) / 60), 2) AS avg_duration_min
FROM divvy.all_trips
GROUP BY member_casual, season
ORDER BY member_casual, season;

Roundtrips vs. one-way trips

This view classifies each ride as a roundtrip (start and end at the same station) or a one-way trip, comparing counts and average durations across user types to help separate leisure loops from point-to-point travel.

CREATE OR REPLACE VIEW divvy.vw_roundtrips AS
SELECT
    member_casual,
    CASE WHEN start_station_id = end_station_id THEN 'Roundtrip'
         ELSE 'One-way'
    END AS trip_type,
    COUNT(*) AS total_rides,
    ROUND(AVG(EXTRACT(EPOCH FROM (ended_at - started_at)) / 60), 2) AS avg_duration_min
FROM divvy.all_trips
GROUP BY member_casual, trip_type
ORDER BY member_casual, trip_type;

Peak usage by hour

This view counts rides by hour of day for each user type, ordered so that peak-demand periods can be read off directly.

CREATE OR REPLACE VIEW divvy.vw_peak_usage AS
SELECT
    member_casual,
    EXTRACT(HOUR FROM started_at) AS hour_of_day,
    COUNT(*) AS total_rides
FROM divvy.all_trips
GROUP BY member_casual, EXTRACT(HOUR FROM started_at)
ORDER BY member_casual, hour_of_day;

Member-to-casual ratios over time

This view computes, per day, the number of member and casual rides and each group's percentage share of the daily total. The NULLIF guard prevents division by zero on days with no rides, and the daily shares make shifts in the user mix easy to track across the year.

CREATE OR REPLACE VIEW divvy.vw_member_casual_ratio AS
SELECT
    DATE_TRUNC('day', started_at) AS ride_date,
    SUM(CASE WHEN member_casual = 'member' THEN 1 ELSE 0 END) AS member_rides,
    SUM(CASE WHEN member_casual = 'casual' THEN 1 ELSE 0 END) AS casual_rides,
    ROUND(
        (SUM(CASE WHEN member_casual = 'member' THEN 1 ELSE 0 END)::NUMERIC /
         NULLIF(SUM(CASE WHEN member_casual IN ('member','casual') THEN 1 ELSE 0 END), 0)
        ) * 100, 2
    ) AS pct_member,
    ROUND(
        (SUM(CASE WHEN member_casual = 'casual' THEN 1 ELSE 0 END)::NUMERIC /
         NULLIF(SUM(CASE WHEN member_casual IN ('member','casual') THEN 1 ELSE 0 END), 0)
        ) * 100, 2
    ) AS pct_casual
FROM divvy.all_trips
GROUP BY ride_date
ORDER BY ride_date;


Exploratory Data Analysis

Data quality & preparation

Count total rides

This query calculates the total number of rides recorded in the dataset. By simply counting all rows in the divvy.all_trips table, we establish the size of the dataset and gain an overall sense of its scale. This figure provides important context for interpreting future analyses, as it allows us to measure the proportion of rides that fall into different categories and track patterns against the total ridership base.

SELECT COUNT(*) FROM divvy.all_trips;
1 records
count
5860568

Check for missing values

This query examines the dataset for missing or null entries in key fields such as start_station_name, end_station_name, and rideable_type. By summing the null values in each column, we can gauge how complete the dataset is. Missing values may arise from incomplete trip records, technical errors, or limitations in the data collection system, and they can bias results if not handled through cleaning, filtering, or imputation. Note that NULL station names and IDs were already replaced with the defaults 'Unknown' and 'NA' during cleaning, so the zeros returned for those columns confirm that step; only rideable_type reflects the raw data here.

SELECT
  SUM(CASE WHEN start_station_name IS NULL THEN 1 ELSE 0 END) 
  AS missing_start_station,
  SUM(CASE WHEN end_station_name IS NULL THEN 1 ELSE 0 END) 
  AS missing_end_station,
  SUM(CASE WHEN rideable_type IS NULL THEN 1 ELSE 0 END) 
  AS missing_ride_type
FROM divvy.all_trips;
1 records
missing_start_station missing_end_station missing_ride_type
0 0 0

Check duplicates

This query investigates whether the dataset contains duplicate records based on the ride_id field, which should uniquely identify each trip. By grouping rides by their IDs and checking for counts greater than one, it highlights any rides that appear multiple times. Because deduplication was applied per monthly table, a ride_id present in more than one monthly file can still appear twice in the combined all_trips table, which is exactly what this check surfaces. Repeated records inflate counts and distort statistical measures if not addressed.

SELECT ride_id, COUNT(*) 
FROM divvy.all_trips
GROUP BY ride_id
HAVING COUNT(*) > 1;
Displaying records 1 - 10
ride_id count
F74208AD3A0B73CB 2
DF457AB27F30DC3D 2
0354FD0756337B59 2
7FD8876225A67920 2
CF25E56A43052CFF 2
44CC4F2EBA96A21E 2
29F756066AB38269 2
3CE5E2EB3B4E1999 2
439F3F47987B3C29 2
07DBFDA3C91006AE 2
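A compact follow-up quantifies the problem by counting how many surplus rows the duplicated ride_ids contribute in total, a figure that a final cross-month deduplication pass would need to remove:

```sql
-- Surplus rows contributed by duplicated ride_ids
SELECT COUNT(*) - COUNT(DISTINCT ride_id) AS surplus_rows
FROM divvy.all_trips;
```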

Trip duration

Summary stats

This query provides descriptive statistics on trip durations while filtering out extreme outliers, keeping only rides that last between one minute and twenty-four hours. It calculates the average, median, and 90th percentile ride length for both members and casual riders. These summary statistics are useful for comparing behavioral patterns between the two groups, such as whether casual riders tend to take longer trips than members. Filtering the data ensures that anomalies, like extremely short or excessively long rides, do not skew the results.

SELECT member_casual,
  AVG(ride_length_min) AS avg_duration,
  PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY ride_length_min) 
  AS median_duration,
  PERCENTILE_CONT(0.9) WITHIN GROUP (ORDER BY ride_length_min) 
  AS perc90_duration
FROM divvy.all_trips
WHERE ride_length_min BETWEEN 1 AND 1440  -- filter outliers (1 min – 24 hrs)
GROUP BY member_casual;
2 records
member_casual avg_duration median_duration perc90_duration
casual 21.56863 12.43048 43.70709
member 12.40836 8.84685 24.07167

Station & route popularity

Top 10 start stations

This query identifies the ten most frequently used starting stations by counting the number of trips that begin at each location. Ranking stations by ride volume highlights the areas of highest demand within the network; these stations often serve as transportation hubs, tourist destinations, or dense residential and commercial areas. The results provide a foundation for planning bike availability, expanding infrastructure, or improving service in the most active locations. Note that the top entry, "Unknown", is the label assigned during cleaning to rides with missing station data rather than a physical station, and should be set aside when interpreting genuine station demand.

SELECT start_station_name, COUNT(*) AS rides
FROM divvy.all_trips
GROUP BY start_station_name
ORDER BY rides DESC
LIMIT 10;
Displaying records 1 - 10
start_station_name rides
Unknown 1073951
Streeter Dr & Grand Ave 66020
Dusable Lake Shore Dr & Monroe St 43969
Kingsbury St & Kinzie St 39659
Michigan Ave & Oak St 39634
Dusable Lake Shore Dr & North Blvd 39621
Clark St & Elm St 35543
Clinton St & Washington Blvd 34512
Millennium Park 33169
Clinton St & Madison St 33024

Top 10 most common routes

This query determines the most frequently traveled routes by counting and ranking rides for each start-end station pair, revealing the most popular travel corridors in the system. These routes can indicate regular commuting paths or heavily visited leisure destinations, and the results are useful for understanding rider preferences, network efficiency, and where targeted operational support may be needed. As with station popularity, the top "Unknown"-to-"Unknown" pair reflects rides with missing station data rather than a real corridor.

SELECT start_station_name, end_station_name, COUNT(*) AS rides
FROM divvy.all_trips
GROUP BY start_station_name, end_station_name
ORDER BY rides DESC
LIMIT 10;
Displaying records 1 - 10
start_station_name end_station_name rides
Unknown Unknown 526345
Streeter Dr & Grand Ave Streeter Dr & Grand Ave 9668
Dusable Lake Shore Dr & Monroe St Dusable Lake Shore Dr & Monroe St 7917
Calumet Ave & 33rd St State St & 33rd St 5921
State St & 33rd St Calumet Ave & 33rd St 5913
Dusable Lake Shore Dr & Monroe St Streeter Dr & Grand Ave 5692
Ellis Ave & 60th St Ellis Ave & 55th St 5437
Michigan Ave & Oak St Michigan Ave & Oak St 5262
Ellis Ave & 60th St University Ave & 57th St 5187
Ellis Ave & 55th St Ellis Ave & 60th St 5137

Rideable types

Bike type usage by membership

This query examines how different types of bikes are used by members versus casual riders. By grouping rides by both membership status and rideable_type, it counts how many rides fall into each category. This analysis highlights preferences for specific vehicle types, such as whether casual users favor electric bikes while members prefer classic bikes. Understanding these patterns can help inform operational decisions, fleet composition, and marketing strategies tailored to user groups.

SELECT member_casual, rideable_type, COUNT(*) AS rides
FROM divvy.all_trips
GROUP BY member_casual, rideable_type;
6 records
member_casual rideable_type rides
casual classic_bike 974966
casual electric_bike 1091477
casual electric_scooter 85215
member classic_bike 1760670
member electric_bike 1889118
member electric_scooter 59122

Demand & operational insights

Stations with biggest inflow-outflow imbalance

This query identifies the stations with the largest mismatch between departures and arrivals. It calculates a “net outflow” by subtracting the number of trips ending at a station from the number of trips starting there. Stations with high positive values see more departures than arrivals, leading to potential bike shortages, while negative values indicate stations where bikes accumulate. Note that departures and arrivals must be aggregated separately and then joined: within a GROUP BY start_station_name, a filtered count on end_station_name still ranges over the same group of rows (and after the earlier NULL handling neither column is ever NULL), so a naive single-table formulation returns zero for every station. These results are crucial for operational planning, as they highlight the locations most in need of rebalancing to maintain service reliability and meet rider demand.

WITH departures AS (
  SELECT start_station_name AS station, COUNT(*) AS n_out
  FROM divvy.all_trips
  GROUP BY start_station_name
),
arrivals AS (
  SELECT end_station_name AS station, COUNT(*) AS n_in
  FROM divvy.all_trips
  GROUP BY end_station_name
)
SELECT station,
       COALESCE(d.n_out, 0) - COALESCE(a.n_in, 0) AS net_outflow
FROM departures d
FULL OUTER JOIN arrivals a USING (station)
ORDER BY net_outflow DESC
LIMIT 10;


Conclusion